Conversation
425a818 to
8af04f0
Compare
567217e to
5c7a71a
Compare
| from ..pipeline_config import PipelineGeneratorConfig | ||
|
|
||
|
|
||
| def should_run_step(test_step: TestStep, config: PipelineGeneratorConfig) -> bool: |
There was a problem hiding this comment.
would love to see if we can separate test selection logic for "pipeline" and "step"
|
I'm sorry but this PR is too big to review in 1 go. Can we walk through the design together & break this down to multiple PRs? |
5c7a71a to
ba64a8f
Compare
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com> Simplify folder structure Signed-off-by: Reza Barazesh <rezabarazesh@meta.com> Less magic strings Signed-off-by: Reza Barazesh <rezabarazesh@meta.com> Update readme Signed-off-by: Reza Barazesh <rezabarazesh@meta.com> Deduplicate CI and fastcheck Update readme Lint and formatting Minimal version
8091c94 to
e542cce
Compare
|
This PR was updated to significantly reduce the complexity and SLOC |
khluu
left a comment
There was a problem hiding this comment.
Sorry I tried to review but this is still too long & way too verbose. Let me see what I can do to make a few jobs more declarative & come up with a design for this pipeline generator.
Also, we should have more unit tests, not just integration test, and we should not just compare the result to jinja. When this is put in use, jinja will be deprecated anyway so the logic won't be updated across 2 generators.
| @property | ||
| def container_image_cu118(self): | ||
| """Get the CUDA 11.8 container image.""" | ||
| return f"public.ecr.aws/q9t5s3a7/vllm-ci-{self._get_repo_suffix()}-repo:$BUILDKITE_COMMIT-cu118" |
There was a problem hiding this comment.
you should be able to reuse a constant var for public.ecr.aws/q9t5s3a7 here and have vllm-ci-test-repo or vllm-ci-postmerge.. to be a different constant, not by suffix
There was a problem hiding this comment.
the reason is we don't always rely on suffix
| """Get repository suffix based on branch (postmerge for main, test otherwise).""" | ||
| return "postmerge" if self.branch == "main" else "test" | ||
|
|
||
| @property |
There was a problem hiding this comment.
I feel like all of these property funcs can be merged in one func get_container_image(..) so that when you change the logic, it's easier when everything is in 1 place
|
|
||
| # ECR and Images | ||
| VLLM_ECR_URL = "public.ecr.aws/q9t5s3a7" | ||
| VLLM_ECR_REPO = f"{VLLM_ECR_URL}/vllm-ci-test-repo" |
There was a problem hiding this comment.
this can just be vllm-ci-test-repo
| BUILD_KEY_AMD = "amd-build" | ||
| BUILD_KEY_TORCH_NIGHTLY = "image-build-torch-nightly" | ||
|
|
||
| # Agent Queues |
There was a problem hiding this comment.
can we convert this to an enum class?
| # ============================================================================== | ||
|
|
||
|
|
||
| def generate_hardware_tests(config) -> List[Dict[str, Any]]: |
There was a problem hiding this comment.
I think this whole function is too verbose.. Step should be its own class/abstraction.
This continues the work on the pipeline-generator. The new code will act as a full drop-in replacement for the jinja templates.
Currently the test-pipeline and fastcheck have been tested.
For more details read the README.MD files
This helps us in a few ways:
1-Python is easier to write and review
2-We added a lot of unit tests so changes in ci-infra are less likely to break the entire CI
3-Supports both “CI” and “Fastcheck” mode currently
4-The code was written with github actions in mind so it should be able to help with GHA migration
Testing
We have some unit tests that run as part of the CI. But more importantly we have two integration tests:
buildkite/pipeline_generator/tests/test_integration_comprehensive.py
buildkite/pipeline_generator/tests/test_integration_fastcheck.py
These tests will compare the generated yaml trees between jinja and pipeline-generator and ensure they are 100% equal under various scenarios.